Separating sets of strings by finding matching patterns is almost always hard
Authors
Abstract
We study the complexity of the problem of searching for a set of patterns that separate two given sets of strings. This problem has applications in a wide variety of areas, most notably data mining, computational biology, and understanding the complexity of genetic algorithms. We show that the basic problem of finding a small set of patterns that match one set of strings but do not match any string in a second set is difficult (NP-complete, W[2]-hard when parameterized by the size of the pattern set, and APX-hard). We then perform a detailed parameterized analysis of the problem, separating tractable and intractable variants. In particular, we show that parameterizing by the size of the pattern set together with the number of strings, or by the size of the alphabet together with the number of strings, gives FPT results, among others.
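To make the separation problem concrete, here is a small brute-force sketch. The abstract does not define the pattern model, so this sketch makes an assumption: all strings have the same length, and a pattern is a string over the alphabet plus a wildcard `*` that matches any single character. The names `matches`, `separates`, and `smallest_separating_set` are hypothetical illustrations and are not taken from the paper. The search is exponential and only suitable for toy inputs.

```python
from itertools import combinations, product
from typing import List, Optional, Sequence

WILDCARD = "*"  # assumption: '*' matches exactly one arbitrary character


def matches(pattern: str, s: str) -> bool:
    """True if the pattern matches s position by position (equal lengths assumed)."""
    return len(pattern) == len(s) and all(
        p == WILDCARD or p == c for p, c in zip(pattern, s)
    )


def separates(patterns: Sequence[str], good: Sequence[str], bad: Sequence[str]) -> bool:
    """Each good string is matched by some pattern and no bad string by any pattern."""
    covers_good = all(any(matches(p, g) for p in patterns) for g in good)
    avoids_bad = not any(matches(p, b) for p in patterns for b in bad)
    return covers_good and avoids_bad


def smallest_separating_set(
    good: Sequence[str], bad: Sequence[str], alphabet: str
) -> Optional[List[str]]:
    """Exhaustive search for a minimum-size separating pattern set.

    Exponential in the string length and in the number of patterns, so it is
    only usable on toy inputs.
    """
    if not good:
        return []
    length = len(good[0])
    symbols = list(alphabet) + [WILDCARD]
    # Keep only candidate patterns that match no bad string.
    candidates = []
    for chars in product(symbols, repeat=length):
        pattern = "".join(chars)
        if not any(matches(pattern, b) for b in bad):
            candidates.append(pattern)
    # Try k = 1, 2, ...: choosing patterns that cover all good strings is a set-cover-style search.
    for k in range(1, len(good) + 1):
        for choice in combinations(candidates, k):
            if separates(choice, good, bad):
                return list(choice)
    return None  # no separating set exists (e.g. a good string also occurs in bad)
```

For example, `smallest_separating_set(["aab", "abb"], ["bab", "aaa"], "ab")` returns `["a*b"]`. One pattern covers both good strings and excludes both bad ones. The set-cover shape of the inner search is only an intuition for why small pattern sets are hard to find. It is not the reduction used in the paper.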
Similar resources
Finding the Smallest Turing Machine Using k log(n) Non-deterministic Guesses
Consider that we are given a number m and two disjoint finite sets of strings A and R. Does there exist a DFA with at most m states that accepts the strings in A and rejects the strings in R? We refer to this problem as the inference problem for DFA’s and denote it by INFDFA. It was shown by E. Mark Gold in [4] that INFDFA is NP-hard. To the best of my knowledge, it is not known whether INFDFA r...
Discovering Best Variable-Length-Don't-Care Patterns
A variable-length-don’t-care pattern (VLDC pattern) is an element of the set Π = (Σ ∪ {⋆})*, where Σ is an alphabet and ⋆ is a wildcard matching any string in Σ*. Given two sets of strings, we consider the problem of finding the VLDC pattern that is the most common to one, and the least common to the other. We present a practical algorithm to find such best VLDC patterns exactly, powerfully sped up by ...
Optimizing image steganography by combining the GA and ICA
In this study, a novel approach that uses a combination of steganography and cryptography for hiding information in digital images as host media is proposed. In the process, secret data is first encrypted using the mono-alphabetic substitution cipher method, and then the encrypted secret data is embedded inside an image using an algorithm which combines the random patterns based on Space Fillin...
A New Family of String Classifiers Based on Local Relatedness
This paper introduces a new family of string classifiers based on local relatedness. We use three types of local relatedness measurements, namely, longest common substrings (LCStr’s), longest common subsequences (LCSeq’s), and window-accumulated longest common subsequences (wLCSeq’s). We show that finding the optimal classifier for two given sets of strings (the positive set and the negative set)...
On the Varshamov-Tenengolts construction on binary strings
This paper is motivated by the problem of finding the largest single-deletion-correcting code for binary strings. The Varshamov–Tenengolts construction classifies binary strings into non-overlapping sets; the largest of these sets is asymptotically the largest single-deletion-correcting code. However, despite the asymptotic optimality, little is known about the quality of the construction as a func...
Journal title:
- Theor. Comput. Sci.
Volume: 665
Pages: -
Publication date: 2017